Upgrade AI PR reviewer model: gpt-5.4 → gpt-5.5 (CI + local)#522
Conversation
- .github/workflows/ai_pr_review.yml: CI Codex reviewer model -> gpt-5.5 (effort xhigh / read-only sandbox unchanged) - .claude/scripts/openai_review.py: DEFAULT_MODEL -> gpt-5.5; add gpt-5.5 to _is_reasoning_model (api-backend timeout/token classification only) - .claude/commands/ai-review-local.md: update default-model + reasoning- model references - CHANGELOG: [Unreleased] entry No PRICING entry: CI + local reviewer run gpt-5.5 via the codex backend (subscription/flat-rate), which does not consult the api-backend cost table. Validated before the swap via tools/reviewer-eval/ A/B (gpt-5.5 >= gpt-5.4 on every test-backed recall case incl. a bug buried in a ~3k-line diff, 0 new false positives, faster) and an end-to-end CI canary (action CLI 0.135.0 runs gpt-5.5 and caught a planted P0). gpt-5.4 remains accepted. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Overall Assessment✅ Looks good — no unmitigated P0/P1 findings. The PR does not touch estimator methodology or statistical outputs. One P2 cost-visibility/documentation issue should be fixed, but it is not a blocker. Executive Summary
MethodologyNo findings. The diff only changes reviewer tooling/docs/workflow/changelog files. No estimator implementation, math, weighting, variance/SE, inference fields, assumptions, or default statistical behavior changed, so Code QualityFinding 1 — P2: API backend loses cost visibility for the new default model
PerformanceNo findings. MaintainabilityNo findings. The model default and reasoning-model handling are narrowly scoped and consistent across the changed script/workflow paths. Tech DebtFinding 2 — P3 informational: workflow contract coverage remains tracked
SecurityNo findings. The workflow continues to use Documentation/TestsFinding 3 — P3: local docs understate API-backend cost after the model change
Verification
|
Add gpt-5.5 ($5/$30) and gpt-5.5-pro ($30/$180) to PRICING at OpenAI's
confirmed standard rates (developers.openai.com/api/docs/pricing). The
production reviewer uses the flat-rate codex backend, but `--backend auto`
falls back to the metered API path when the codex CLI is unavailable —
there estimate_cost("gpt-5.5") previously returned None and silently
dropped the cost lines. gpt-5.5-pro gets its own entry so it doesn't
prefix-fall-back to the standard rate. Also bumps the api-backend cost
range note (gpt-5.5 ~2x gpt-5.4) and the CHANGELOG wording.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment✅ Looks good — no unmitigated P0/P1 findings. The previous cost-visibility issue is substantially addressed, and the PR does not touch causal estimator methodology or statistical output. Executive Summary
MethodologyNo findings. Affected methods: none. The diff is limited to reviewer tooling/docs/workflow/changelog files and does not change any causal estimator, mathematical formula, weighting scheme, variance/SE computation, inference field, or identification assumption. Code QualityFinding 1 — P2: GPT-5.5 API cost estimate misses long-context pricing
PerformanceNo findings. MaintainabilityNo findings. The model default, CI model, Codex command construction, timeout logic, and pricing table are narrowly scoped and consistent. Tech DebtFinding 2 — P3 informational: workflow contract test expansion remains tracked
SecurityNo findings. The workflow continues to use Documentation/TestsFinding 3 — P3: new
Verification performed:
|
What
Upgrades the Codex PR-reviewer model from gpt-5.4 → gpt-5.5 for both the CI reviewer and the local
/ai-review-localdefault (codex backend;effort: xhigh/read-onlysandbox unchanged)..github/workflows/ai_pr_review.yml— CIopenai/codex-action@v1model: gpt-5.5.claude/scripts/openai_review.py—DEFAULT_MODEL = "gpt-5.5";gpt-5.5added to_is_reasoning_model.claude/commands/ai-review-local.md— default-model + reasoning-model doc referencesCHANGELOG.md—[Unreleased]entryNo
PRICINGentry: the reviewer runsgpt-5.5via the codex backend (subscription / flat-rate), which doesn't consult the api-backend cost table.gpt-5.4remains accepted.Why — validated before the swap
Decided empirically, not by assumption (the prior upgrade regressed by missing real issues while changing model + prompt at once):
tools/reviewer-eval/): on a real-bug corpus + a k=6 big-diff de-risk, gpt-5.5 matched-or-beat gpt-5.4 on every test-backed recall case — including a bug buried in a ~3k-line methodology diff — with zero new false positives and faster latency. (The one early apparent divergence was traced to a weak, non-test-backed ground truth, i.e. noise.)openai/codex-action@v1, bundled codex CLI v0.135.0) runs gpt-5.5 (verified from the run-log model banner) and caught the planted P0.Notes for review
ready-for-cilabel to run the test matrix and become mergeable (CI Gate).